ABSTRACT


Course selection is a crucial and challenging problem that 
students have to face while navigating through an under- 
graduate degree program. The decisions they make shape 
their future in ways that they cannot conceive in advance. 
Available departmental sample degree plans are not person- 
alized for each student, and personal discussion time with 
an academic advisor is usually limited. Data-driven meth- 
ods supporting decision making have gained importance to 
empower student choices and scale advice to large cohorts. 
We propose Scholars Walk, a random-walk-based approach 
that captures the sequential relationships between the dif- 
ferent courses. Based on the “wisdom of the crowd” and the 
students’ prior courses, we recommend a short list of courses 
for next semester. Our experimental evaluation illustrates 
that Scholars Walk outperforms other collaborative filtering 
and popularity-based approaches. At the same time, our 
framework is very efficient, easily interpretable, while also 
being able to take into consideration important aspects of 
the educational domain. 


Keywords 
course recommendation, Markov chains, random walks, se- 
quential recommendation, higher education 


1. INTRODUCTION 


The general purpose of higher education is to offer programs, 
which will help learners to gain knowledge throughout their 
studies. Students enjoy a plethora of offerings. However, 
course selection can be “messy and unorganized” [3] as it 
depends on many factors that students need to consider. 
Students have to balance personal preferences (interests, ob- 
jectives, and career goals) and general education and degree 
program requirements. As a result, course selection can be 
a non-trivial task. 


Decisions can be made based on manual guides offered by
each department, but these are not tailored to individual
cases [7] in a higher education setting. Personalized
assistance can be given by academic advisers; however, this is
not scalable to large cohorts with thousands of students. The
ratio of students to advisors may be very high [14], limiting
the adviser-advisee discussion time. Additionally, college 
students take on average up to 20% more courses than re- 
quired [2]. Better advising can help alleviate these problems. 
We need predictive models that can be employed to enable 
strategic action and attain better results. In this paper, we 
focus on appropriately designing a course recommendation 
system (CRS) that could facilitate the conversation between 
advisors and students for future planning. 


There are several existing approaches to generate a set of 
courses to recommend for next semester. Most of them
suggest courses based on either the constraints and
requirements that they satisfy or their expected grades. This paper
introduces Scholars Walk, a random-walk based approach 
for the course selection problem. It describes a personal- 
ized model that takes advantage of the sequential nature of 
course selection. We assume that students’ choices for the 
next term depend on the courses they have taken so far. 
In our approach, we build a Markov chain for each degree 
program over the courses taken consecutively. Then, we per- 
form a random walk, starting from the courses that students 
took in the previous semester. We evaluated the proposed 
approach on a number of different departments with dif- 
ferent subjects and characteristics. Scholars Walk overall 
outperforms other competing approaches in all the metrics 
considered in this paper. 


2. RELATED WORK 


Recommender systems have been broadly applied within the 
context of student learning [16]. We will further review the 
different approaches developed to help students select a sub- 
set of courses to register for an upcoming semester. The 
first course recommender systems are based on constraint 
satisfaction [22]. The sequence-based recommender [24] also 
considers complex constraints to improve the expected time- 
to-degree and GPA. A related body of work involves mining 
of association rules. Al-Badarenah et al. [1] cluster the stu- 
dents based on their grades first. Nguyen et al. [18] apply 
sequential rule mining in (course, grade) pairs and recom- 
mend the courses with the best performance. A different 
CRS was proposed by Esteban et al. [10], where there is
available information about students’ satisfaction after tak- 
ing a course. 

Table 1: Statistics for each major.


Major | n | m | grades | %pop | flex
Accounting | 846 | 53 | 22,524 | 45.9 | 0.28
Aerospace Engr | 532 | 109 | 16,259 | 25.7 | 0.10
Biology | 1,275 | 146 | 28,084 | 14.9 | 0.11
Biol Society Env | 709 | 57 | 14,597 | 31.4 | 0.31
Biomedical Engr | 644 | 131 | 19,748 | 23.8 | 0.16
Chemical Engr | 826 | 108 | 24,825 | 26.3 | 0.11
Chemistry | 724 | 145 | 18,292 | 17.4 | 0.14
Civil Engr | 651 | 112 | 19,189 | 26.5 | 0.12
Communication | 1,333 | 95 | 22,421 | 15.4 | 0.19
Computer Sc | 998 | 161 | 24,899 | 13.7 | 0.11
Electrical Engr | 740 | 164 | 22,191 | 17.1 | 0.12
Elementary Ed | 770 | 49 | 16,527 | 40.4 | 0.31
English | 1,176 | 153 | 17,736 | 9.5 | 0.11
Finance | 1,234 | 83 | 32,255 | 29.7 | 0.20
Genetics Cell | 680 | 93 | 15,385 | 23.1 | 0.19
Journalism | 2,306 | 100 | 40,519 | 17.1 | 0.20
Kinesiology | 1,176 | 164 | 33,622 | 14.8 | 0.16
Marketing | 1,291 | 69 | 29,901 | 30.8 | 0.20
Mechanical Engr | 1,369 | 132 | 39,436 | 18.9 | 0.11
Nursing | 819 | 86 | 25,136 | 31.2 | 0.27
Nutrition | 554 | 87 | 15,591 | 29.7 | 0.19
Political Science | 1,307 | 171 | 19,260 | 8.1 | 0.12
Psychology | 1,894 | 115 | 31,141 | 13.4 | 0.15


n, m are the number of students and courses. 

%pop is the course popularity (percentage of students 
that took a course at least once). 

The last column (flex) is the degree flexibility. 


Recently, recurrent neural networks (RNNs) have been suc- 
cessfully applied within the educational domain. Long Short 
Term Memory (LSTM) networks have been used for grade
prediction [13, 20]. In terms of course recommendation,
a combination of LSTMs and skip-gram models has
also been developed to balance implicit and explicit student
preferences [23]. Morsy et al. [17] have also used RNNs to
recommend courses that are expected to help maintain or
improve students’ GPA. Other approaches include a
Markov-based model [15] that represents the sequence of courses
taken as a stochastic process. Gardner et al. [11] build a
co-enrollment network and extract features for a network-based
structural model. Finally, Elbadrawy et al. [9] propose
using the academic features to improve the recommendation
performance.


3. DOMAIN & DATASET 


This work focuses on the undergraduate students in a tradi- 
tional educational institution. We used a dataset from the 
University of Minnesota that spans more than 10 years. The 
A-F grading scale (A, A-, B+, B, B-, C+, C, C-, D+, D, F) 
is used. Courses in which a student receives less than a C- 
do not count toward satisfying degree requirements. 


We extracted the degree programs that have at least 500 
graduated students from 23 different majors. We only kept 
students that actually received their degree and had at least 
three consecutive semesters with valid courses. We selected 
the 40 most frequent courses and the courses that belonged 
to frequent subjects. A subject is considered frequent if
students have taken at least three courses that belong to that
subject on average. We removed instances without an A-F
grade, and non-academic courses, like independent/directed
study or field study. We did not consider offerings in the 
summer semester. As these are less common, they would 
distort the course sequence of students not enrolled in sum- 
mer. Basic statistics for each degree program are shown in 
Table 1. The course popularity of course i is the percentage
of students that have taken i at least once during their
studies, and %pop is its average over all the courses of a
major. The degree flexibility (flex) is a measure
of how different are the course selections that students
make. It is one minus the average Jaccard similarity
coefficient over every pair of students. The similarity is
computed as the number of courses that two students have
in common divided by the minimum number of courses that
either of the two students has taken.
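
As a rough illustration of the two statistics reported in Table 1, the sketch below computes the average course popularity and the degree flexibility for a toy cohort. The function names and the toy data are hypothetical and not part of the original study. One assumption deserves attention: the conventional Jaccard coefficient divides the number of shared courses by the size of the union of the two course sets, which is the default here, while the `denominator="min"` option divides by the size of the smaller set instead.

```python
from itertools import combinations


def average_course_popularity(histories):
    """Mean, over courses, of the percentage of students who took the course."""
    courses = set().union(*histories.values())
    n_students = len(histories)
    shares = [
        100.0 * sum(c in taken for taken in histories.values()) / n_students
        for c in courses
    ]
    return sum(shares) / len(shares)


def degree_flexibility(histories, denominator="union"):
    """One minus the average pairwise similarity between students' course sets."""
    sims = []
    for a, b in combinations(histories.values(), 2):
        shared = len(a & b)
        # Assumption: "union" is the textbook Jaccard; "min" uses the smaller set.
        size = len(a | b) if denominator == "union" else min(len(a), len(b))
        sims.append(shared / size)
    return 1.0 - sum(sims) / len(sims)


# Hypothetical toy cohort: student id -> set of courses taken at least once.
toy = {
    "s1": {"MATH1000", "CSCI1111", "CSCI2222"},
    "s2": {"MATH1000", "CSCI1111", "MATH2222"},
    "s3": {"MATH1000", "MATH2222", "MATH3333"},
}
pop = average_course_popularity(toy)
flex = degree_flexibility(toy)
```

A cohort in which every student follows the same plan would have a flexibility of zero under either denominator.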


4. PROPOSED METHOD 
4.1 Assumptions & Notation 


In the context of course recommendation for higher educa- 
tion, we make the following assumptions: 


1. Time is discrete and moves in steps, from one semester 
to the next. 

2. There is a relative ordering of the courses in terms of 
course levels, difficulty or material covered. 

3. Learning is seldom non-sequential; each course com- 
pleted provides some knowledge and experience that 
can be used in future courses. As a consequence, se- 
quence matters in course selection. 

4. In the absence of enough domain experts, the order 
in which courses are taken by students historically can 
reveal useful information on the curriculum and degree 
requirements. 

5. We know the number of courses that the student will 
take next semester. 


For the rest of the paper we will adopt the following no- 
tation. When we use the word target we will refer to the 
student/course/semester for which we want to generate
recommendations. Matrices are denoted with capital bold
letters, while vectors are denoted with lowercase bold letters.
Calligraphic letters will be used for sets.


The set of students is $\mathcal{S}$ and has size $m$. The set of all courses
is denoted by $\mathcal{C}$, $|\mathcal{C}| = n$. Student $j$ has an enrollment history
$\mathcal{H}_j$, that is an ordered set of courses,
$\{\mathcal{C}_{j,1}, \ldots, \mathcal{C}_{j,t}, \ldots, \mathcal{C}_{j,t_j}\}$,
where $\mathcal{C}_{j,t}$ is the set of courses taken in semester $t$ and $t_j$ is
the last semester that the student took courses. Table 2
presents the symbols we used.


4.2 Building the Markov chain 


Markov models satisfy the Markov property, i.e., the condi- 
tional probability distribution of future states depends only 
on the current state. In the simplest Markov model, known 
as first-order, each state is formed by a single action, i.e., 
a student took a course. In the case of K-th-order models, 
the state-space will correspond to all possible sequences of 
K actions. As the available data could not adequately sup- 
port the number of states of higher-order chains, these mod- 
els would suffer from reduced coverage and possibly worse 
overall performance [6]. Therefore, we adopted a first-order
Markov chain. We assume that the next-semester courses
depend only on the courses that the student is taking in the
current semester.


Table 2: Notation.

Symbol | Description
$n$, $m$ | number of courses, students
$i$, $i'$ | indexes for courses
$j$, $j'$ | indexes for students
$t_j$ | number of semesters that $j$ has taken courses
$t$ | index for semesters
$\mathcal{C}$, $\mathcal{S}$ | set of courses, students
$\mathcal{A}$ | set of states in Markov chain $\{A_1, \ldots, A_n\}$
$\mathcal{H}_j$ | enrollment history of student $j$
$\mathcal{C}_{j,t}$ | courses that student $j$ took in semester $t$
$\mathbf{T}$, $\mathbf{F}$ | matrices ($n \times n$)
$T_{k,l}$ | the $(k,l)$ element of matrix $\mathbf{T}$
$\mathbf{T}_{k,:}$ | the $k$-th row of matrix $\mathbf{T}$ ($1 \times n$)
$\mathbf{T}_{:,l}$ | the $l$-th column of matrix $\mathbf{T}$ ($n \times 1$)
$\mathbf{u}$ | personalization vector ($1 \times n$)
$\mathbf{p}^{(k)}$ | state vector ($1 \times n$) at timestep $k$


Figure 1: Example: Anna’s enrollment history.
[Figure: Anna’s courses arranged in three columns (1st, 2nd, and 3rd Semester), with lines connecting courses taken in consecutive semesters. Courses shown: MATH1000, MATH1111, CSCI1111, MATH2222, CSCI2222, CSCI4444, MATH3333.]


Markov models are represented by the parameters $(\mathcal{A}, \mathbf{T})$,
where $\mathcal{A}$ is the set of states for which the Markov model is
defined; and $\mathbf{T}$ is an $(n \times n)$ transition probability matrix
(TPM), where $n$ is the number of states (i.e., courses). In
this context, state $A_i$ is associated with the fact that the
student took the course $i$. Each entry $T_{i,i'}$ corresponds to
the probability of moving to state $A_{i'}$ when the process is
in state $A_i$, i.e., taking course $i'$ after course $i$. Note that
this matrix is not symmetric, i.e., $T_{i,i'} \neq T_{i',i}$, as the order
in which the courses are taken matters.


Based on the historical enrollment information of the students,
we first compute $\mathbf{F}$, an $(n \times n)$ matrix that holds
the counts of every pair of consecutive courses. Every pair
of courses $(i, i')$ that a student has taken consecutively is
used to estimate the entry $F_{i,i'}$, i.e., the frequency of the
event that state $A_{i'}$ follows the state $A_i$. For example,
consider student Anna in Fig. 1. The entry corresponding to
the course pair (MATH1000, CSCI1111) will be updated.
Similarly, every line connecting two courses will contribute
equally to the corresponding element of matrix $\mathbf{F}$.
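
The counting step can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the plain list-of-lists representation, and the toy history (loosely modelled on the Fig. 1 example) are hypothetical. It assumes that every course of one semester is paired with every course of the semester that immediately follows it.

```python
def build_count_matrix(histories, courses):
    """Count how often course b is taken in the semester right after course a."""
    index = {course: k for k, course in enumerate(courses)}
    n = len(courses)
    F = [[0] * n for _ in range(n)]
    for semesters in histories:
        # Pair each semester with the one that immediately follows it.
        for current, following in zip(semesters, semesters[1:]):
            for a in current:
                for b in following:
                    F[index[a]][index[b]] += 1
    return F, index


# Hypothetical history: one list per student, one set of courses per
# semester, in chronological order.
courses = ["MATH1000", "MATH1111", "CSCI1111", "MATH2222", "CSCI2222"]
anna = [{"MATH1000", "MATH1111"}, {"CSCI1111", "MATH2222"}, {"CSCI2222"}]
F, index = build_count_matrix([anna], courses)
```

Here the pair (MATH1000, CSCI1111) receives one count, while the reverse pair and pairs that are two semesters apart receive none.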


After we compute the frequencies of matrix $\mathbf{F}$, we need to
normalize it to get $\mathbf{T}$, a row-stochastic matrix, so that the
total transition probability from state $i$ to any other state
will sum up to 1:

$$T_{i,i'} = \frac{F_{i,i'}}{\sum_{i''=1}^{n} F_{i,i''}}, \quad \text{if } \sum_{i''=1}^{n} F_{i,i''} > 0.$$


Additionally, it is possible for some rows to sum to
zero. This occurs when a course is taken in the last
semester of every student, so there are no courses after that
to pair it with. In that case, we set the diagonal elements
of the zero rows to one: $T_{i,i} = 1$ and $T_{i,i'} = 0$ for $i \neq i'$, if
$\sum_{i''=1}^{n} F_{i,i''} = 0$.
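
A minimal sketch of this normalization, including the treatment of all-zero rows, is given below. The function name and the small count matrix are hypothetical and chosen only for illustration.

```python
def to_transition_matrix(F):
    """Row-normalize counts; rows with no outgoing counts become self-loops."""
    n = len(F)
    T = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(F):
        total = sum(row)
        if total > 0:
            T[i] = [count / total for count in row]
        else:
            T[i][i] = 1.0  # course only ever taken in a final semester
    return T


# Hypothetical 3-course count matrix; course 2 never precedes another course.
F = [[0, 3, 1],
     [0, 0, 2],
     [0, 0, 0]]
T = to_transition_matrix(F)
```

Every row of the resulting matrix sums to one, so it can be used directly as a transition probability matrix.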


4.3 Walking over courses 

We can view the Markov chain in the context of random 
walk on a course-to-course graph that is governed by the 
transition probability matrix. A random walk on a directed 
graph will form a path of vertices generated from a start 
vertex by selecting an edge, making a step by traversing the 
edge to a new vertex, and repeating the process [4]. This 
concept has been applied to many scientific fields. Closer to 
this work, random walks have recently been used for top- 
n item recommendation [19], and they are also known to 
empower systems used in production at major social media 
platforms [12, 8]. 


A random walk starts with any probability distribution
$\mathbf{u} \in \mathbb{R}^{1 \times n}$, where $u_i$ is the probability of starting at vertex $i$. If one
starts at a vertex $i$, then $u_i = 1$, else $u_{i'} = 0$ for $i' \neq i$.
In our setting, the random walk for student $j$ will equally
start from any course in the student’s last semester, so the
personalization vector will be:

$$u_i = \begin{cases} 1 / |\mathcal{C}_{j,t_j}| & \text{if } i \in \mathcal{C}_{j,t_j}, \\ 0 & \text{otherwise.} \end{cases}$$
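
The following sketch builds such a starting vector. It assumes, following the statement that the walk starts equally from any course of the last semester, that the probability mass is split uniformly over those courses. The function name and the course index are hypothetical.

```python
def personalization_vector(last_semester, index):
    """Uniform start distribution over the courses of the last semester."""
    u = [0.0] * len(index)
    for course in last_semester:
        u[index[course]] = 1.0 / len(last_semester)
    return u


# Hypothetical course index and last-semester enrollment.
index = {"MATH1000": 0, "CSCI1111": 1, "MATH2222": 2, "CSCI2222": 3}
u = personalization_vector({"CSCI1111", "MATH2222"}, index)
```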


Let $\mathbf{p}^{(k)} \in \mathbb{R}^{1 \times n}$ be a row vector with an element for each
vertex specifying the probability of being there at timestep $k$.
Before we start the walk, $\mathbf{p}^{(0)} = \mathbf{u}$. After the first step, the
probability of being at vertex $i'$ is the sum over each adjacent
vertex $i$ of starting at $i$ and taking the transition from $i$ to
$i'$. In matrix notation, when we are at step $k$ and we take
a step, we will get the following probability distribution:

$$\mathbf{p}^{(k+1)} = \mathbf{p}^{(k)} \mathbf{T}, \tag{2}$$

where the $i$-th entry of $\mathbf{p}^{(k+1)}$ is the probability of the walk
after $k + 1$ steps to land at vertex $i$. This can be written as
a function of the starting probability vector as:

$$\mathbf{p}^{(k+1)} = \mathbf{u} \mathbf{T}^{k+1}. \tag{3}$$


The probability of the walker to reach the vertices after K 
steps provides an intuitive measure that can be used to rank 
the courses and offer personalized recommendations to the 
student accordingly. 
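
To make the ranking step concrete, the sketch below propagates a starting distribution for a fixed number of steps and orders the courses by the resulting probabilities. The helper names and the 3-course transition matrix are hypothetical.

```python
def step(p, T):
    """One step of the walk: row vector p times matrix T."""
    n = len(T)
    return [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]


def walk(u, T, K):
    """Distribution after K steps when starting from u."""
    p = list(u)
    for _ in range(K):
        p = step(p, T)
    return p


# Hypothetical transition matrix over three courses.
T = [[0.0, 0.75, 0.25],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0]]
u = [1.0, 0.0, 0.0]  # the student took only course 0 last semester
p1 = walk(u, T, 1)
p2 = walk(u, T, 2)
# Courses ordered from most to least probable after one step.
ranking = sorted(range(len(p1)), key=lambda i: p1[i], reverse=True)
```

After one step the mass sits on the direct successors of course 0; after two steps it has moved on to course 2, which illustrates how longer walks drift away from the student's own history.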


Scholars Walk 


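To introduce an additional way for personalization in our
model, we perform a random walk with restarts [21]. We
introduce a parameter $\alpha$, $0 < \alpha < 1$, that controls if the walk
will take the step described above, or if the walk will restart.
In the latter case, we use the personalized probability
distribution as the restarting distribution. The probability
distribution now is defined as:

$$\begin{aligned}
\mathbf{p}^{(k+1)} &= \alpha \mathbf{p}^{(k)} \mathbf{T} + (1 - \alpha) \mathbf{u} \\
&= \mathbf{p}^{(k)} (\alpha \mathbf{T} + (1 - \alpha) \mathbf{1} \mathbf{u}) \\
&= \mathbf{u} (\alpha \mathbf{T} + (1 - \alpha) \mathbf{1} \mathbf{u})^{k+1},
\end{aligned} \tag{4}$$

where $\mathbf{1}$ is a column vector $(n \times 1)$ of ones. The product
$\mathbf{1}\mathbf{u}$ gives an $(n \times n)$ matrix where every row holds
the probability that the walk will start at the corresponding
course. Scholars Walk performs a random walk governed
by the matrix $\alpha \mathbf{T} + (1 - \alpha) \mathbf{1} \mathbf{u}$.


Algorithm 1 SCHOLARS WALK
Input: Model $\mathbf{T}$, student’s personalization vector $\mathbf{u}$, parameters $\alpha$, $\beta$, number of steps $K$.
Output: Recommendation vector $\mathbf{p}^{rec}$.
$\mathbf{p}^{(0)} \leftarrow \mathbf{u}$, $k \leftarrow 0$
repeat
    $k \leftarrow k + 1$
    $\mathbf{p}^{(k)} \leftarrow \alpha \mathbf{p}^{(k-1)} \mathbf{T} + (1 - \alpha) \mathbf{u}$    ▷ Take a step.
    $\mathbf{p}^{(k)} \leftarrow \mathbf{p}^{(k)} / \|\mathbf{p}^{(k)}\|_1$    ▷ Normalize $\mathbf{p}^{(k)}$.
until $\|\mathbf{p}^{(k)} - \mathbf{p}^{(k-1)}\|_2 < tol$ or $k \geq K$
for $i \leftarrow 1$ to $n$ do
    $p^{(k)}_i \leftarrow p^{(k)}_i \cdot pop_i^{-\beta}$    ▷ Penalize popular courses.
end for
$\mathbf{p}^{rec} \leftarrow \mathbf{p}^{(k)}$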


The exact steps we followed are shown in Alg. 1. We can 
specify the number of steps to perform, or we can allow the 
algorithm to converge. If the number of steps is very small, 
the walk might not explore enough courses. If the number of 
steps is large, the walk might travel too far, and the
recommendations might not be so relevant for the student.
Additionally, to limit the domination of popular courses, we
penalize the probabilities with the term $pop_i^{-\beta}$ [5], where $pop_i$
is the popularity of course $i$. The parameter $\beta$, $0 < \beta < 1$,
shows how harsh we need to be with the penalty term.
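
The procedure of Alg. 1 can be sketched in a few lines of plain Python. This is an illustrative reading of the algorithm, not the authors' implementation: the function name, the default tolerance, and the toy inputs are assumptions, and the sketch expects $K \geq 1$ and strictly positive popularity values.

```python
def scholars_walk(T, u, pop, alpha, beta, K, tol=1e-8):
    """Random walk with restarts followed by a popularity penalty."""
    n = len(T)
    p = list(u)
    for _ in range(K):
        previous = p
        # Take a step: follow T with probability alpha, restart at u otherwise.
        p = [
            alpha * sum(previous[i] * T[i][j] for i in range(n))
            + (1.0 - alpha) * u[j]
            for j in range(n)
        ]
        total = sum(p)
        p = [x / total for x in p]  # L1 normalization
        change = sum((a - b) ** 2 for a, b in zip(p, previous)) ** 0.5
        if change < tol:
            break  # converged before using all K steps
    # Penalize popular courses: multiply by pop_i ** (-beta).
    return [x * pop[i] ** (-beta) for i, x in enumerate(p)]


# Hypothetical inputs: three courses, the student took course 0 last semester.
T = [[0.0, 0.75, 0.25],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0]]
u = [1.0, 0.0, 0.0]
pop = [0.9, 0.5, 0.2]  # assumed popularity of each course, as a fraction
scores = scholars_walk(T, u, pop, alpha=0.5, beta=1.0, K=1)
best = max(range(len(scores)), key=scores.__getitem__)
```

With `beta=0.0` the penalty has no effect and the scores are the walk probabilities themselves; larger values of `beta` shift the ranking toward less popular courses.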


Scholars Walk allows us to consider direct as well as
transitive relations between the courses. It also provides a
considerable degree of personalization, in order to recommend
courses that are relevant to each particular student.


5. EXPERIMENTAL DESIGN 
5.1 Competing approaches 


The baselines are two group popularity approaches, on the 
department level (Pop1) and the academic level (Pop2) 
of the student measured by the number of years in the pro- 
gram [9]. For Pop1, we recommend the most popular courses 
in the major. For Pop2, we recommend the most common 
courses on the major and the academic level of the student 
(“freshmen”, “sophomores”, “juniors”, and “seniors”).
Students after their fourth year are considered seniors.


We also compared against Basic Markov model (Markov) 
and Basic Markov model with skip (MarkovSkip) [15]. In 
these models, for a target student, the set of courses that 
other students have taken after taking a course that the 
target student took are the possible courses to recommend. 
We consider the combination of courses during the last two
semesters to build and test the model. Each course is
assigned a recommendation score that is the sum of all the
conditional probabilities that lead to that course starting
from the student’s enrollment in the last semester. While
the counts used in this case are the same as the ones
computed in our matrix F, the conditional probabilities are
computed differently. The skip model was introduced in order
to produce recommendations for students whose set of prior
courses did not have a match. In that case, we find other
students that have a similar course history to the target
student, and weight their corresponding probabilities by a
parameter $\lambda$.


Last, we train an LSTM-based course prediction model sim- 
ilar to [17, 23]. LSTMs can learn temporal dependencies 
with additional gates to retain and forget selected informa- 
tion. As input, we use a multi-hot representation of course 
enrollments per semester which are mapped to a predicted 
sequence of vectors. Once the LSTM has been trained, we
feed the network with a binary vector that indicates the
courses that the target student took in the past semester.
The weights at the output of the model are used to rank the 
courses. 


5.2 Evaluation metrics 

As in prior work [9, 15, 17, 23], we used Recall@$n_s$ as
the primary evaluation metric for the predictions, where
$n_s$ is the number of courses that the student took in the
target semester. This is the percentage of actually enrolled
courses that were contained in the recommendation list. The
reported metrics are averaged across all the students for
whom we made predictions. Note that recall and precision are
equivalent in our setting, since we recommend exactly as many
courses as the student will take in the upcoming semester.


We also compute the percentage of queries for which we were
able to retrieve at least one of the courses that the student
took in the target semester (%rel). It measures in how
many cases we were able to recommend at least one course
that was relevant.
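
Both metrics can be computed with a few lines of code. The sketch below is illustrative only; the function names and the two toy queries are hypothetical.

```python
def recall_at_ns(recommended, taken):
    """Share of the courses actually taken that appear in the top-n_s list."""
    n_s = len(taken)
    return len(set(recommended[:n_s]) & set(taken)) / n_s


def evaluate(queries):
    """Average Recall@n_s and %rel over (ranked list, taken set) pairs."""
    recalls = [recall_at_ns(ranked, taken) for ranked, taken in queries]
    mean_recall = sum(recalls) / len(recalls)
    pct_rel = 100.0 * sum(r > 0 for r in recalls) / len(recalls)
    return mean_recall, pct_rel


# Hypothetical queries: (ranked recommendations, courses actually taken).
queries = [
    (["CSCI2222", "MATH2222", "CSCI4444"], {"CSCI2222", "CSCI4444"}),
    (["MATH3333", "CSCI1111"], {"MATH2222"}),
]
mean_recall, pct_rel = evaluate(queries)
```

Because the list is cut at $n_s$, the number of recommended courses equals the number of relevant ones, which is why recall and precision coincide in this setting.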


5.3 Experimental setting

Model selection. Using the dataset described in Sect. 3, 
we split it into train, validation and test sets as follows. 
All semesters before 2013 (about 10 years) were used for 
training, courses taken during 2013 and in Spring 2014 were 
used for validation, and courses taken afterwards (Fall 2014 
to Spring 2017) were used for test purposes, to report the 
results. The training set was used for building the models,
whereas the validation set was used to select the best
performing parameters in terms of the highest Recall@$n_s$.
Based on the best set of parameters for the validation set,
we computed the test set results in Sect. 6.
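
The temporal split can be expressed as a simple rule on the year and term of each enrollment record. The record layout below (dictionaries with `year` and `term` keys, and only "Spring" and "Fall" terms, since summer offerings are excluded) is an assumption made for this sketch.

```python
def split_by_term(records):
    """Assign enrollment records to train / validation / test by semester."""
    train, validation, test = [], [], []
    for record in records:
        year, term = record["year"], record["term"]
        if year < 2013:
            train.append(record)
        elif year == 2013 or (year == 2014 and term == "Spring"):
            validation.append(record)
        else:
            test.append(record)  # Fall 2014 onwards
    return train, validation, test


# Hypothetical enrollment records.
records = [
    {"year": 2012, "term": "Fall"},
    {"year": 2013, "term": "Spring"},
    {"year": 2014, "term": "Spring"},
    {"year": 2014, "term": "Fall"},
    {"year": 2017, "term": "Spring"},
]
train, validation, test = split_by_term(records)
```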


Parameters. For parameter $\alpha$, we tried the following set of
values: {1e-4, 1e-3, 1e-2, 1e-1, 0.2, 0.4, 0.6, 0.7, 0.8, 0.85,
0.9, 0.99, 0.999}. For parameter $\beta$, we tested values from
0 to 0.8, in increments of 0.025. In terms of the number of
steps that we allowed for our walker, we tested the values 1,
3, and 1000. The last value corresponds to no limit for the
number of steps.


Table 3: Results for Scholars Walk w.r.t. K.

K | Recall@$n_s$ | %rel | $\alpha$ | $\beta$ | avg#steps
1 | 0.466 | 75.1 | 0.955 | 0.047 | 1
3 | 0.460 | 74.6 | 0.088 | 0.053 | 1.95
1000 | 0.461 | 74.6 | 0.075 | 0.051 | 2.32

K is the number of steps that we allow to our walker.
The $\alpha$, $\beta$ columns show the average values of these
parameters over the models of all the majors.
The last column shows the actual average number of
steps the Scholars Walk made before convergence.


Table 4: Performance comparison.

Model | Recall@$n_s$ | %rel
Pop1 | 0.336 | 62.5
Pop2 | 0.338 | 64.6
Markov | 0.456 | 73.0
MarkovSkip | 0.400 | 69.6
LSTM | 0.406 | 69.6
Scholars Walk | 0.466 | 75.1


Additional filtering. We build a different model for each
major for all the approaches we tested. After we generate
a ranked list of the courses using any method, we filter out
courses that are not offered in the target semester. We also
remove courses that the student has already taken with a
grade that counts toward the degree (see Sect. 3), as retaking
them would not satisfy any additional degree requirement.
In the end, we return a list with as many recommendations
as the number of courses, $n_s$, that the student took in the
target semester, based on assumption 5.
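
A minimal sketch of this post-processing step is shown below. The function name, the score dictionary, and the `offered` and `passed` sets are hypothetical inputs; `passed` stands for the courses that the student has already completed with a grade that counts toward the degree.

```python
def recommend(scores, offered, passed, n_s):
    """Rank by score, drop ineligible courses, keep the top n_s."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    eligible = [c for c in ranked if c in offered and c not in passed]
    return eligible[:n_s]


# Hypothetical scores produced by any of the compared models.
scores = {"CSCI2222": 0.40, "MATH2222": 0.30, "CSCI4444": 0.20, "MATH3333": 0.10}
offered = {"CSCI2222", "MATH2222", "MATH3333"}  # CSCI4444 is not offered
passed = {"MATH2222"}                           # already completed
top = recommend(scores, offered, passed, n_s=2)
```

Filtering before truncation matters: cutting the list to $n_s$ entries first could leave fewer than $n_s$ recommendations once ineligible courses are removed.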


6. RESULTS 


In this section, we will try to answer the following questions: 
1) How do the parameters in our models affect the overall 
performance? Specifically, how does the number of steps 
affect recommendation performance? 2) What is the per- 
formance of our approach compared to the state-of-the-art 
approaches? 


6.1 The effect of the number of steps 

The performance of our models in terms of the metrics
computed for different values of $K$ is shown in Table 3. For each
model and selection of $K$, we see the values of the
parameters $\alpha$ and $\beta$ that were used. These parameters were selected
based on the recall on the validation set. The parameter $\alpha$
controls the restarting probabilities, while $\beta$ is used to
re-weight the probability distribution before recommending its
highest-weighted courses. The column avg#steps shows the
average number of steps that the Scholars Walk actually
made before convergence.


In this domain, we need only a few steps, as we can
understand from Table 3: not only do we get the best performance
when we set $K = 1$, but also, when we allow the walk to take
many steps, the parameter $\alpha$ gets smaller values. This forces
the walk to go back to the student’s personalized starting
vector with higher probability, indicating that the starting
distribution is very important. Additionally, even if we do
not put any constraints on $K$, the number of steps that the
Scholars Walk takes is quite small. There is a small increase
when increasing $K$ from 1 to 3, but after that, the number
of steps actually taken is not that high.


It is worth pointing out that, while setting $K = 1$ gives us
the best overall performance, this is not the case for all the
departments. The right value for $K$ depends on the dataset
used. In our data, there are four departments that need
these extra steps. We observed that these departments have
low average course popularity, which is the average percentage
of students that have taken a course at least once at some
point during their studies, over all the courses. The average
value for the departments with $K > 1$ was 16.7 ± 9.7%,
while for the rest of the models the corresponding number
is 24.1 ± 7.2%. A stronger signal is present in the metric of
the degree flexibility, which is the average Jaccard distance
between the courses that any pair of students took, as
defined at the end of Sect. 3. The departments with $K > 1$
have 0.118 ± 0.005 degree flexibility against 0.184 ± 0.066
for the rest of the departments. This is an indicator that
for stricter degrees, the walk depends on the extra steps to
explore more courses. In these departments, students will
take overall very similar sets of courses. On the other hand,
if the degree program offers more freedom to the students,
they select a wider range of courses, and there are more
connections between courses.


6.2 Performance comparison 

By comparing the best Scholars Walk model against five
competing approaches, we get the results in Table 4. Our
model performs the best, both in terms of recall and in the
percentage of cases for which it manages to return some
relevant recommendations.


Popularity approaches perform reasonably well: they can
recommend relevant courses in more than 60% of the cases.
However, specifying the academic level of the
student does not help much. The two Basic
Markov models have quite different performance. The
Markov model with skips performs poorly compared to the
Basic model. Additionally, it is worth mentioning that the
Skip model was performing better and better as the
parameter $\lambda$ was getting smaller. The cases that
do not completely match the target student’s history are
weighted by a power of $\lambda$. Consequently, when $\lambda \to 0$, the
Skip model becomes the Basic model. For that reason, the
smallest value of $\lambda$ for which we report results is 0.4.


While comparing the Basic Markov model with Scholars 
Walk, it may seem that they have similar performance. How- 
ever, that might be misleading, as the Basic Markov model 
utilizes longer course enrollment history than the Scholars 
Walk. It looks back two semesters on the student’s courses, 
which corresponds to a second-order Markov chain. More- 
over, the model uses data from two semesters not only for 
computing the associated probabilities, but also to make pre- 
dictions. This leads to increased complexity because of the 
larger state-space with no benefit in recommendation quality.
The same applies to the LSTMs. Their increased
complexity might lead to overfitting when
the data are not sufficient for training. Our approach, which
is a first-order Markov chain, manages to perform better
than the higher-order models and LSTMs.

Scholars Walk can accurately predict the course selection
of the students, by taking advantage of the “breadth and 
depth” of the data. In terms of time complexity, once we 
build the transition probability matrix, walking through the 
courses is trivial. As a result, it scales well with the number 
of students, while providing them personalized recommen- 
dations. At the same time, it is a white-box model, where 
the recommendations are easily explainable. 


7. CONCLUSION 


In this paper we propose Scholars Walk, a novel method 
designed to harvest the sequential patterns arising from past 
course enrollment data in order to recommend a short list of 
personalized course suggestions for the next semester. The 
proposed method relies on a random walk-based scheme on a 
course-to-course graph and personalization is achieved by a 
student-adapted starting distribution reflecting the current 
student’s enrollments. When compared with five competing 
models, from popularity-based to LSTMs and Basic Markov 
models, Scholars Walk achieves the best performance. It 
manages to be a successful, scalable approach that provides 
personalized recommendations for every student. 


8. ACKNOWLEDGMENTS 

This work was supported in part by NSF (1447788, 1704074, 
1757916, 1834251), Army Research Office (W911NF 1810344), 
Intel Corp, and the Digital Technology Center at the Uni- 
versity of Minnesota. Access to research and computing fa- 
cilities was provided by the Digital Technology Center and 
the Minnesota Supercomputing Institute. 


9. REFERENCES 
[1] A. Al-Badarenah and J. Alsakran. An automated
recommender system for course selection. Intl. Journal
of Advanced Computer Science and Applications,
7(3):1166-1175, 2016.

[2] C. C. America. Time is the enemy, 2011.

[3] E. Babad and A. Tayeb. Experimental analysis of
students’ course selection. British Journal of
Educational Psychology, 73(3):373-393, 2003.

[4] A. Blum, J. Hopcroft, and R. Kannan. Random walks
and markov chains. In Foundations of data science.
Vorabversion eines Lehrbuchs, 2016.

[5] F. Christoffel, B. Paudel, C. Newell, and A. Bernstein.
Blockbusters and wallflowers: Accurate, diverse, and
scalable recommendations with random walks. In 9th
ACM Conf. on Recommender Systems, pages 163-170,
New York, NY, USA, 2015. ACM.

[6] M. Deshpande and G. Karypis. Selective markov
models for predicting web page accesses. ACM Trans.
on Internet Technology (TOIT), 4(2):163-184, 2004.

[7] A. Diamond, J. Roberts, T. Vorley, G. Birkin,
J. Evans, J. Sheen, and T. Nathwani. UK review of the
provision of information about higher education:
advisory study and literature review: report to the UK
higher education funding bodies by CFE research. 2014.

[8] C. Eksombatchai, P. Jindal, J. Z. Liu, Y. Liu,
R. Sharma, C. Sugnet, M. Ulrich, and J. Leskovec.
Pixie: A system for recommending 3+ billion items to
200+ million users in real-time. In World Wide Web
Conf., pages 1775-1784, 2018.

[9] A. Elbadrawy and G. Karypis. Domain-aware grade
prediction and top-n course recommendation. In 10th
ACM Conf. on RecSys, pages 183-190, 2016.

[10] A. Esteban, A. Z. Gómez, and C. Romero. A hybrid
multi-criteria approach using a genetic algorithm for
recommending courses to university students. In 11th
Intl. Conf. on Educational Data Mining, 2018.

[11] J. P. Gardner, C. Brooks, and W. Li. Learn from your
(markov) neighbor: Coenrollment, assortativity, and
grade prediction in undergraduate courses. Journal of
Learning Analytics, 5(3):42-59, 2018.

[12] P. Gupta, A. Goel, J. Lin, A. Sharma, D. Wang, and
R. Zadeh. Wtf: The who to follow service at twitter.
In 22nd Intl. Conf. on World Wide Web, pages
505-514, New York, NY, USA, 2013. ACM.

[13] Q. Hu and H. Rangwala. Course-specific markovian
models for grade prediction. In Pacific-Asia Conf. on
Knowledge Discovery and Data Mining, pages 29-41.
Springer, 2018.

[14] A. Kadlec, J. Immerwahr, and J. Gupta. Guided
pathways to student success perspectives from indiana
college students and advisors. New York: Public
Agenda, 2014.

[15] E. S. Khorasani, Z. Zhenge, and J. Champaign. A
markov chain collaborative filtering model for course
enrollment recommendations. In Big Data (Big Data),
IEEE Intl. Conf. on, pages 3484-3490. IEEE, 2016.

[16] N. Manouselis, H. Drachsler, R. Vuorikari,
H. Hummel, and R. Koper. Recommender systems in
technology enhanced learning. In Recommender
systems handbook, pages 387-415. Springer, 2011.

[17] S. Morsy and G. Karypis. Learning course sequencing
for course recommendation. 2018.

[18] H.-Q. Nguyen, T.-T. Pham, V. Vo, B. Vo, and T.-T.
Quan. The predictive modeling for learning student
results based on sequential rules. Intl. Journal of
Innovative Computing, Information and Control
(IJICIC), 14(6):2129-2140, 2018.

[19] A. N. Nikolakopoulos and G. Karypis. Recwalk:
Nearly uncoupled random walks for top-n
recommendation. In 12th ACM Intl. Conf. on Web
Search and Data Mining, pages 150-158. ACM, 2019.

[20] F. Okubo, T. Yamashita, A. Shimada, and H. Ogata.
A neural network approach for students’ performance
prediction. In Seventh Intl. Learning Analytics &
Knowledge Conf., pages 598-599. ACM, 2017.

[21] L. Page, S. Brin, R. Motwani, and T. Winograd. The
pagerank citation ranking: Bringing order to the web.
Technical report, Stanford InfoLab, 1999.

[22] A. Parameswaran, P. Venetis, and H. Garcia-Molina.
Recommendation systems with complex constraints:
A course recommendation perspective. ACM Trans.
on Information Systems (TOIS), 29(4):20, 2011.

[23] Z. A. Pardos, Z. Fan, and W. Jiang. Connectionist
recommendation in the wild: on the utility and
scrutability of neural networks for personalized course
guidance. User Modeling and User-Adapted
Interaction, pages 1-39, 2019.

[24] C. Wong. Sequence based course recommender for
personalized curriculum planning. In Intl. Conf. on
Artificial Intelligence in Education, 2018.


Proceedings of The 12th International Conference on Educational Data Mining (EDM 2019)


